Nonlinear expectation, including sublinear expectation as its special
case, is a new and original framework of probability theory and has
potential applications in some scientific fields, especially in finance
risk measure and management. Under the nonlinear expectation framework,
however, the related statistical models and statistical inferences have
not yet been well established. The goal of this paper is to construct
the sublinear expectation regression and investigate its statistical
inference. First, a sublinear expectation linear regression is defined
and its identifiability is given. Then, based on the representation
theorem of sublinear expectation and the newly defined model, several
parameter estimations and model predictions are suggested, and the
asymptotic normality of the estimations and the mini-max property of the
predictions are obtained. Furthermore, new methods are developed for
variable selection in high-dimensional models. Simulation studies and a
real-life example are carried out to illustrate the new models and
methodologies. All notions and methodologies developed are essentially
different from classical ones and can be thought of as a foundation for
general nonlinear expectation statistics.
1 Introduction

Among all the assumptions imposed on classical statistical models, the
most vital one may be that the model under study has a certain
probability distribution, which may or may not be known. The classical
linear expectation and determinant statistics are built on
distribution-certainty or model-certainty. Distribution-certainty,
however, does not always hold in practice, for example in risk measure
and super-hedging in finance. For related references see, e.g., El
Karoui, Peng and Quenez (1997), Artzner, Delbaen, Eber and Heath (1999),
Chen and Epstein (2002), Föllmer and Schied (2004). We also studied a
relevant practical problem. In a financial market, the non-performing
loan (NPL) is always an important object to be monitored. The NPL ratio
is related to economic indicators such as the loan-deposit ratio and the
capital adequacy ratio. We have used an indicator set and the
corresponding data published in Vendors Database of China (2000-2010) to
establish a regression relationship between the NPL ratio and the
indicators in the set. It has been discovered that the regression error
has mean-uncertainty; namely, the error mean is distributed in the
interval [-0.1833, 0.1747]. We will discuss the issue in detail in
Section 5.

Without distribution-certainty, the resulting expectation is usually
nonlinear. The earlier works on nonlinear expectation may be traced back
to Huber (1981) in the sense of robust statistics, or to Walley (1991)
in the sense of imprecise probabilities. In recent decades, the theory
and methodology of nonlinear expectation have been well developed and
have received much attention in application fields such as finance risk
measure and control. A typical example of nonlinear expectation, called
g-expectation (small g), was introduced in Peng (1997) in the framework
of backward stochastic differential equations. As a further development,
G-expectation (big G) and its related versions were proposed by Peng
(2006). Under the nonlinear expectation framework, the most common
distribution is the so-called G-normal distribution, which was first
introduced in Peng (2006). Furthermore, as a
theoretical basis of the nonlinear expectation, the law of large numbers as 
well as the central limit theorem were also established by Peng (2008 and 
2009). Also, from different points of view, many authors studied nonlinear 
expectation, see, e.g., Denis and Martini (2006), Denis et al. (2011), Soner et 
al. (2011a, 2011b, 2012 and 2013). Other references include Chen and Peng 
(2000), Briand et al. (2000), Coquet et al. (2002), Gao (2009), Li and Peng 
(2011), Peng (1999, 2004, 2005 and 2009), Rosazza (2006), Song (2012), and 
Xu and Zhang (2009), among many others. 

Contrary to the fast development of nonlinear expectation in probability
theory, little attention has been paid, to the best of our knowledge, to
the related statistical models and statistical inferences. Although the
earlier work of Huber (1981) refers initially to an upper-lower
expectation, a special nonlinear expectation, its main aspects focus on
robust statistics, and the underlying true model is implicitly supposed
to have a certain distribution. The gross error model, for example,
contains a certain true distribution in the contaminated distribution
set, and based on such a distribution set the upper-lower expectation
can be defined; see, e.g., Strassen (1964) and Huber (1981). In
classical statistical frameworks, the heteroscedastic model may be the
closest one to the model-uncertainty mentioned above, but it only has
variance-uncertainty, and the corresponding inference methods do not
involve any notion of nonlinear expectation. In the nonparametric
framework the model structure is not specified, and in the Bayesian
framework the model parameter is random. But these two statistical
frameworks are essentially different from the model-uncertainty
mentioned above, and the corresponding methods are unrelated to any
nonlinear expectation. In time series models, although the data depend
on observation time, strict or weak stationarity is required to
guarantee the certainty of statistical inferences. In a word, under the
classical statistical framework, including parametric models,
nonparametric models, Bayesian models and time series models, the
defined expectations are linear. Without this linearity, it is
essentially difficult or impossible to achieve, by classical methods,
classical certain conclusions such as the identifiability of the model
parameter, estimation consistency, asymptotic normality of the
estimation and model selection consistency.

Under model-uncertainty frameworks, classical statistical methods may no
longer be available. The classical maximum likelihood estimator, for
example, does not exist or cannot be uniquely determined because there
is no single certain likelihood function. The classical least squares
estimation is also invalid because the parameter is defined via linear
expectation. Moreover, classical statistical models such as linear
regression models may not be well-defined, as their identifiability
depends on mean-certainty; without mean-certainty, the regression
function is unidentifiable. Furthermore, it will be verified by
simulations in Section 5 that under model-uncertainty, the usual methods
may not work and may even nearly collapse. Thus, to achieve the target
of statistical inference, it is necessary to develop new statistical
frameworks and new statistical methods.

The main contribution of our paper is to establish a framework of
sublinear expectation regression for models that have
distribution-uncertainty. Based on a sublinear expectation space, a
sublinear expectation linear regression is defined and its
identifiability is achieved. Our model is always available for the cases
of variance-uncertainty and/or mean-uncertainty. Unlike classical
regression, the new model tends to use a large value to predict the
response variable and attains the mini-max prediction risk. This implies
that our method is a robust strategy and has potential applications in
finance risk measure and management. Based on the representation theorem
of sublinear expectation, new parameter estimation methods are suggested
and the resulting estimators are asymptotically normally distributed for
the case of high-frequency data. Finally, our method is extended to
variable selection for high-dimensional regression. It is worth
mentioning that under the model-uncertainty framework,
certainty-statistical inferences are established in this paper,
including parameter-certainty, prediction-certainty and
distribution-certainty of parameter estimation. The notions and
methodologies developed here are nonclassical and original, and the
theoretical framework establishes the foundations for general nonlinear
expectation statistics.

The remainder of the paper is organized in the following way. In Section 
2, a sublinear regression model is built and its identifiability is obtained. 
The estimation and prediction methods are suggested in Section 3. Also the 
asymptotic normality of estimators and the mini-max property of predictions 
are established in this section. The method is extended to variable selection 
for high-dimensional model in Section 4. Simulation studies and a real- 
life example are carried out in Section 5 to illustrate the new model and 
methodology. The proofs of the theorems and the definition of the sublinear
expectation space are postponed to the Appendix.

2 Sublinear expectation regression 

In this section we establish a framework of sublinear expectation regression,
including modeling, estimation, prediction and asymptotic properties.

2.1 Model 

We consider the following linear regression model:

$$Y = \beta' x + \varepsilon, \tag{2.1}$$

where $Y$ is a scalar response variable, $x = (X_1, \dots, X_p)'$ is the
associated $p$-dimensional covariate having a certain distribution
$F_x(x)$, and $\beta = (\beta_1, \dots, \beta_p)'$ is a $p$-dimensional
vector of unknown parameters. Furthermore, it is supposed that the error
$\varepsilon$ is independent of $x$. We need the independence condition
only for simplicity. The idea and methodology developed below can be
extended to the dependent case, but the notation and algorithm are
relatively complex. It is worth pointing out that the essential
difference from the classical regression model is that here the error
$\varepsilon$ has distribution-uncertainty, which is defined in the
following way.

Let $\Omega$ be a given set and $\mathcal{H}$ be a linear space of
real-valued functions defined on $\Omega$. Furthermore, let $\mathbb{E}$
denote a sublinear expectation $\mathcal{H} \to \mathbb{R}$, satisfying
monotonicity, constant preserving, sub-additivity and positive
homogeneity; for the details of the definitions see the Appendix. The
triple $(\Omega, \mathcal{H}, \mathbb{E})$ is then called a sublinear
expectation space. In this paper, we assume that the random variable
$\varepsilon$ is defined on a sublinear expectation space
$(\Omega, \mathcal{H}, \mathbb{E})$. It can be seen from the definition
that the probability distribution of $\varepsilon$ is uncertain. Under
this situation, the independence between $x$ and $\varepsilon$ mentioned
above is defined in the sublinear expectation space, which is a weak
independence (Peng 2008 and 2009). For regression analysis, we suppose
that $\mathcal{H}$ contains linear and quadratic functions, and although
the sublinear expectation $\mathbb{E}$ is supposed to exist, its exact
form may be unknown. Thus, a remarkable point is that, since regression
analysis depends mainly on "expectation", we here only define a
sublinear expectation space, instead of the well-accepted linear
expectation.

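By the representation theorem of sublinear expectation (Peng 2008 and
2009), the sublinear expectation of a function
$g(\varepsilon) \in \mathcal{H}$ can be expressed as a supremum of
linear expectations. Formally, there exists a family of linear
expectations $\{E_f : f \in \mathcal{F}\}$ defined on $\mathcal{H}$ such
that

$$\mathbb{E}[g(\varepsilon)] = \sup_{f \in \mathcal{F}} E_f[g(\varepsilon)] \quad \text{for } g \in \mathcal{H}, \tag{2.2}$$

and there exists an $f_g \in \mathcal{F}$ such that

$$\mathbb{E}[g(\varepsilon)] = E_{f_g}[g(\varepsilon)]. \tag{2.3}$$

Write

$$\overline{\mu} = \mathbb{E}[\varepsilon], \quad \underline{\mu} = -\mathbb{E}[-\varepsilon], \quad \overline{\sigma}^2 = \mathbb{E}[\varepsilon^2], \quad \underline{\sigma}^2 = -\mathbb{E}[-\varepsilon^2].$$

Then, the intervals $[\underline{\mu}, \overline{\mu}]$ and
$[\underline{\sigma}^2, \overline{\sigma}^2]$ characterize the
mean-uncertainty and the variance-uncertainty of $\varepsilon$,
respectively.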
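
As an illustration of (2.2) and of the four quantities just defined, the following is a minimal sketch that replaces the family $\mathcal{F}$ by a finite list of samples, one per candidate distribution. The function names and the toy data are assumptions made for this example and are not part of the paper.

```python
import numpy as np

def sublinear_expectation(g, samples_by_model):
    """Empirical analogue of (2.2): maximum over a finite family of sample means.

    samples_by_model is a list of 1-D arrays; each array holds draws of the
    error under one candidate distribution (a hypothetical, finite stand-in
    for the family F in the representation theorem).
    """
    return max(float(np.mean(g(np.asarray(s)))) for s in samples_by_model)

def uncertainty_intervals(samples_by_model):
    """Return ((mu_lower, mu_upper), (sigma2_lower, sigma2_upper))."""
    mu_upper = sublinear_expectation(lambda e: e, samples_by_model)
    mu_lower = -sublinear_expectation(lambda e: -e, samples_by_model)
    s2_upper = sublinear_expectation(lambda e: e**2, samples_by_model)
    s2_lower = -sublinear_expectation(lambda e: -(e**2), samples_by_model)
    return (mu_lower, mu_upper), (s2_lower, s2_upper)

# Toy family with two candidate error distributions (illustrative data only).
family = [np.array([-1.0, 1.0]), np.array([0.0, 2.0])]
mean_interval, var_interval = uncertainty_intervals(family)
```

With this toy family the first model has mean 0 and second moment 1, and the second has mean 1 and second moment 2, so the two intervals are [0, 1] and [1, 2].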

When $x$ is a random variable, for regression modeling it is necessary
to clarify the sublinear expectation $\mathbb{E}[Y|x]$ conditional on
$x$, since the nonlinear conditional expectation has not yet been
defined in the existing literature. Actually, however, there is no
obstacle to extending the nonlinear unconditional expectation to the
nonlinear conditional expectation. By the representation theorem given
above, for instance, $\mathbb{E}[Y|x]$ can be defined as

$$\mathbb{E}[Y|x] = \sup_{f_{Y|x} \in \mathcal{F}_{Y|x}} E_{f_{Y|x}}[Y|x],$$

where $\{E_{f_{Y|x}} : f_{Y|x} \in \mathcal{F}_{Y|x}\}$ is a family of
conditional linear expectations. With this definition, the properties of
monotonicity, constant preserving, sub-additivity and positive
homogeneity given in the Appendix still hold.

Finally, we should note that it was assumed above that the covariate
vector $x$ has a certain distribution $F_x$ and that the intercept term
of model (2.1) is zero. Here we need the distribution-certainty of $x$
to guarantee that the regression coefficient vector $\beta$ is
identifiable; otherwise, when neither $\varepsilon$ nor $x$ has
distribution-certainty, $\beta$ cannot be uniquely determined. For
details see Remark 2.1 below. The assumption on $x$ and $\varepsilon$
mentioned above is a practical condition. For example, if $Y$ is a
measure of a financial risk and $x$ is the set of the corresponding
economic indicators, then the usual goal of regression analysis is to
describe the risk measure $Y$ for a given economic indicator set $x$.
Therefore, the indicator elements of $x$ can be regarded as having
distribution-certainty, exactly or approximately. In this case, the
model-uncertainty is derived from the unstable financial environment,
which can be grouped into the model error $\varepsilon$. On the other
hand, we need the zero intercept to eliminate the estimation bias;
without it, the estimation is inconsistent. For details see Remark 3.2
below.

2.2 G-normal regression

We first consider the case when the error $\varepsilon$ is supposed to
be G-normally distributed, namely,

$$\varepsilon \sim \mathcal{N}(\{0\} \times [\underline{\sigma}^2, \overline{\sigma}^2]). \tag{2.4}$$

Under this situation, $\varepsilon$ has a certain zero mean but its
variance is uncertain, a special distribution-uncertainty. As was
defined by Peng (2006), $\varepsilon$ is called G-normally distributed
if it is defined on a sublinear expectation space
$(\Omega, \mathcal{H}, \mathbb{E})$ and satisfies that for each
$a, b > 0$,

$$a\varepsilon + b\bar{\varepsilon} \overset{d}{=} \sqrt{a^2 + b^2}\, \varepsilon,$$

where $\bar{\varepsilon}$ is an independent copy of $\varepsilon$ and
"$\overset{d}{=}$" stands for equality in distribution. For the
definition and the representation of the G-normal distribution see Peng
(2006). It follows from the cash translatability of sublinear
expectation given in the Appendix that for regression model (2.1), if
$\varepsilon$ is G-normally distributed as in (2.4), then

$$\mathbb{E}[Y|x] = \beta' x. \tag{2.5}$$



The above relationship (2.5) could be thought of as a G-normal
expectation regression because $\mathbb{E}$ is the G-normal expectation,
a special sublinear expectation. Note that $x$ has an identical
distribution. Then, we have the following conclusion.

Proposition 2.1 (1) If $\varepsilon$ is G-normally distributed as in
(2.4), then the G-expectation of $Y$ is identifiable in the sense that
$\mathbb{E}[Y|x]$ can be uniquely determined by $\beta' x$ as in (2.5).
(2) Besides the condition above, if $E[xx']$ is a positive definite
matrix, then $\beta$ is identifiable in the sense that $\beta$ can be
uniquely determined as

$$\beta = (E[xx'])^{-1} E\{x\, \mathbb{E}[Y|x]\}, \tag{2.6}$$

where the linear expectation $E$ is taken under the certain distribution
$F_x(x)$.

The proof is given in Appendix. For the proposition, we have the following 
remark. 

Remark 2.1 

(1) The proposition implies that if the error $\varepsilon$ is
G-normally distributed and $x$ has distribution-certainty, then G-normal
regression has both regression function-certainty and regression
coefficient-certainty. The conclusion provides a theoretical basis for
regression analysis such as parameter estimation and model prediction.

(2) From the proof of the proposition we can see that if $x$ does not
have distribution-certainty but only a sublinear expectation is defined
for $x$, then $\beta$ usually cannot be uniquely determined. Without the
identifiability of $\beta$, there is no sense in modeling a regression
relationship.

(3) Here we emphasize the use of G-normal regression because a quadratic
loss function will be employed below to construct a "quasi maximum
likelihood" estimation; for details see the next section. In fact the
notion proposed here can be directly extended to general mean-certainty
sublinear expectation regressions. Specifically, we only assume that
$\varepsilon$ has mean-certainty, instead of a G-normal distribution.
Under this situation, model (2.5) could be regarded as a mean-certainty
sublinear expectation regression. From this point of view, the
conclusions in Proposition 2.1 still hold.

2.3 Sublinear expectation regression 

Now we investigate the model in which the error $\varepsilon$ is
mean-uncertain and variance-uncertain. By the cash translatability of
the sublinear expectation given in the Appendix, we have

$$\mathbb{E}[Y|x] = \beta' x + \overline{\mu}. \tag{2.7}$$

This model could be thought of as a sublinear expectation regression
because $\mathbb{E}$ is a sublinear expectation. By (2.7) and arguments
similar to those used in Proposition 2.1, we have the following
conclusion.

Proposition 2.2 (1) If $\underline{\mu} < \overline{\mu}$, then, given
$x$, the sublinear expectation of $Y$ has a shift $\overline{\mu}$; more
precisely, the sublinear expectation of $Y$ has the framework of (2.7).
(2) Besides the condition above, if $E[xx']$ is a positive definite
matrix, then $\beta$ is identifiable; more precisely, $\beta$ can be
uniquely expressed by

$$\beta = (E[xx'])^{-1} E\{x\, \mathbb{E}[Y|x]\} - \overline{\mu}\, (E[xx'])^{-1} E[x]. \tag{2.8}$$

Particularly, if $E[x] = 0$, then

$$\beta = (E[xx'])^{-1} E\{x\, \mathbb{E}[Y|x]\}. \tag{2.9}$$
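
For orientation, the step from the shifted regression function (2.7) to the representation (2.8) can be sketched as follows. This is only an informal outline under the stated assumption that $E[xx']$ is positive definite; it is not a substitute for the proof in the Appendix.

```latex
\begin{align*}
\mathbb{E}[Y \mid x] &= \beta' x + \overline{\mu}
  && \text{by (2.7)} \\
E\{x\,\mathbb{E}[Y \mid x]\} &= E[xx']\,\beta + \overline{\mu}\,E[x]
  && \text{multiply by } x \text{ and take } E \text{ under } F_x \\
\beta &= (E[xx'])^{-1} E\{x\,\mathbb{E}[Y \mid x]\}
        - \overline{\mu}\,(E[xx'])^{-1} E[x]
  && \text{since } E[xx'] \text{ is invertible}
\end{align*}
```

Setting $E[x] = 0$ in the last line removes the term involving $\overline{\mu}$ and gives (2.9).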

The proof is also presented in Appendix. From the proposition, we have 
the following findings. 
Remark 2.2 
(1) The conclusions in the proposition are somewhat surprising because
they suggest a nonclassical point of view and provide a methodological
development. That is to say, in the face of mean-uncertainty, we can
still uniquely determine the parameter vector $\beta$ and then use the
mean-shift framework $\beta' x + \overline{\mu}$, instead of
$\beta' x$, to predict the response variable $Y$. Such a framework
reflects the robust feature of sublinear expectation regression. If $Y$
is a measure of the risk of a financial product, then the sublinear
expectation regression tends to use a relatively large value to predict
risk and, moreover, the increment of the risk measure is just the
sublinear expectation $\overline{\mu}$ of the error $\varepsilon$.

(2) It is worth mentioning that when the model does not have
mean-certainty, the representation (2.8) of the regression coefficient
vector $\beta$ is different from the representation (2.6) for the
mean-certainty model; in other words, the representation (2.6) in the
mean-certainty model is a special case of the representation (2.8) with
$\overline{\mu} = 0$. This is an essential feature of sublinear
expectation regression, i.e., in the mean-uncertainty framework, the
regression coefficient vector $\beta$ depends on the nonlinear
expectation of the error $\varepsilon$. Such a feature is totally
different from classical linear expectation regression, because in the
linear expectation regression framework the regression coefficient
vector $\beta$ has an error-free representation as

$$\beta = (E[xx'])^{-1} E[xY].$$

On the other hand, when $E[x] = 0$, the parameter representation in
(2.9) is free of $\overline{\mu}$. In the following, we mainly focus on
the parameter representation in (2.9) because we will see that without
$\overline{\mu}$, the corresponding estimator of $\beta$ is relatively
simple and is asymptotically unbiased.
3 Estimation and prediction

It is supposed in this section that the dimension $p$ of $\beta$ is
fixed. Let $\{(Y_i, x_i) : i = 1, \dots, N\}$ be a sample from model
(2.1), satisfying

$$Y_i = \beta' x_i + \varepsilon_i, \quad i = 1, \dots, N. \tag{3.1}$$

Unlike the classical case, here $Y_1, \dots, Y_N$ may have
distribution-uncertainty due to the distribution-uncertainty of
$\varepsilon_1, \dots, \varepsilon_N$. The corresponding estimation
method should therefore be different from the classical ones, which only
apply to linear expectation regression models.

It seems that we can use (2.9) to construct the estimator of $\beta$, as
it presents a closed expression for $\beta$. However, in this expression
$\mathbb{E}[Y|x]$ is a sublinear conditional expectation; as in the
classical case, its estimation involves multivariate nonparametric
methods and therefore faces the curse of dimensionality if the dimension
$p$ of $x$ is high. To avoid the problem, we now introduce a mini-max
method to construct the estimator of $\beta$.

Case 1. We first consider the case of $\varepsilon$ having
mean-certainty. Because $Y$ has the sublinear expectation $\beta' x$
given $x$, theoretically we should choose $\beta$ so that it minimizes
the sublinear expectation loss:

$$\mathbb{E}[(Y - \beta' x)^2]. \tag{3.2}$$

We can easily verify that the above sublinear expectation loss is a
convex function of $\beta$. Thus the optimization problem has a unique
global optimal solution. The above is in fact a sublinear expectation
least squares. It is worth mentioning that under the G-normal
distribution, if $\varphi$ is a convex function, then

$$\mathbb{E}[\varphi(\varepsilon)] = \frac{1}{\sqrt{2\pi}\, \overline{\sigma}} \int_{-\infty}^{\infty} \varphi(u) \exp\left\{-\frac{u^2}{2\overline{\sigma}^2}\right\} du,$$

and if $\varphi$ is a concave function, then

$$\mathbb{E}[\varphi(\varepsilon)] = \frac{1}{\sqrt{2\pi}\, \underline{\sigma}} \int_{-\infty}^{\infty} \varphi(u) \exp\left\{-\frac{u^2}{2\underline{\sigma}^2}\right\} du.$$

For details refer to Peng (2006). These imply that on the spaces of
convex functions and concave functions, the G-normal distribution has
density functions
$\frac{1}{\sqrt{2\pi}\, \overline{\sigma}} \exp\{-u^2/(2\overline{\sigma}^2)\}$
and
$\frac{1}{\sqrt{2\pi}\, \underline{\sigma}} \exp\{-u^2/(2\underline{\sigma}^2)\}$,
respectively. Therefore, the above sublinear expectation least squares
could be thought of as a "quasi maximum likelihood".

To implement the estimation procedure, we need the following assumption:

C1. There exists an index decomposition $I_i$, $i = 1, \dots, m$, such
that when $(ij) \in I_i$, $\varepsilon_{i1}, \dots, \varepsilon_{in_i}$
are independent and have an identical distribution.

This condition is essentially implied by the conclusion (2.2) of the
representation theorem given in Subsection 2.1. Thus, $m$ should be
equal to the number of functions in $\mathcal{F}$ if $\mathcal{F}$
contains only a finite number of functions; otherwise, $m$ should tend
to infinity, and in this case condition C1 is only an approximation of
the true one. We will further weaken C1 and suggest a data-driven
decomposition after Theorem 3.1 given below. From now on we suppose,
without loss of generality, that the numbers of elements in $I_i$,
$i = 1, \dots, m$, are equal, i.e., $n_1 = n_2 = \dots = n_m = n$.
Because it is assumed that $\varepsilon_{i1}, \dots, \varepsilon_{in}$
are identically distributed, the independence in condition C1 is the
same as that in the linear expectation framework, instead of the
independence in the nonlinear expectation. Here we need independence
only for simplicity. Without the independence assumption, for example
when $\varepsilon_{i1}, \dots, \varepsilon_{in}$ are weakly dependent,
the conclusions given below still hold; for weakly dependent processes
and the properties of estimation see, for example, Rosenblatt (1956,
1970), Kolmogorov and Rozanov (1960), Bradley and Bryc (1985), and Lu
and Lin (1997). Furthermore, a common decomposition is built according
to the observation time order; more precisely,
$\varepsilon_1, \dots, \varepsilon_N$ are reindexed as
$\varepsilon_{ij} = \varepsilon_{(i-1)n+j}$, $i = 1, \dots, m$,
$j = 1, \dots, n$, and then the index sets are defined as
$I_i = \{(ij) : j = 1, \dots, n\}$. It is known that in a small time
interval, the characteristics of the data can be regarded as unchanged,
exactly or approximately. From this point of view, condition C1 is
relatively mild. We can also decompose the index set according to the
values of $Y$ in descending order, for example.
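
The time-order decomposition described above can be sketched in a few lines. The function name and the choice to drop a trailing incomplete block are assumptions of this illustration, and indices are 0-based rather than 1-based as in the text.

```python
import numpy as np

def time_order_blocks(N, n):
    """Split positions 0..N-1 into m = N // n consecutive blocks of length n.

    Row i (0-based) holds the positions i*n + j, j = 0..n-1, which is the
    0-based form of eps_{ij} = eps_{(i-1)n+j}. Trailing observations that do
    not fill a whole block are dropped (an assumption of this sketch).
    """
    m = N // n
    return np.arange(m * n).reshape(m, n)

blocks = time_order_blocks(N=10, n=3)   # three blocks of three consecutive positions
```

Indexing a response vector with `blocks` then gives an array of shape (m, n) whose rows play the role of the index sets $I_1, \dots, I_m$.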

Denote by $F_i$ the common distribution function of $\varepsilon_{ij}$,
$(ij) \in I_i$. According to the representation theorem of sublinear
expectation given in (2.2), the sublinear expectation loss (3.2) can be
written as $\max_{1 \le i \le m} E_{F_i}[(Y - \beta' x)^2]$ and
therefore its empirical version is

$$\max_{1 \le i \le m} \frac{1}{n} \sum_{j=1}^{n} (Y_{ij} - \beta' x_{ij})^2. \tag{3.3}$$

By minimizing (3.3), we obtain a mini-max estimator of $\beta$ as

$$\hat{\beta}_G = \arg\min_{\beta \in \mathcal{B}} \max_{1 \le i \le m} \frac{1}{n} \sum_{j=1}^{n} (Y_{ij} - \beta' x_{ij})^2. \tag{3.4}$$

It can be easily verified that
$\max_{1 \le i \le m} \frac{1}{n} \sum_{j=1}^{n} (Y_{ij} - \beta' x_{ij})^2$
is a convex function of $\beta$. Thus the resulting estimator
$\hat{\beta}_G$ is a unique global optimal solution of the above
optimization problem. Furthermore, such an estimation procedure can be
easily implemented via, for example, a genetic algorithm.
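Denote $\sigma_i^2 = E(\varepsilon_{ij}^2)$ for $(ij) \in I_i$ and
$\sigma_{k^*}^2 = \max_{1 \le i \le m} \sigma_i^2$, and for simplicity,
assume that

$$\sigma_{k^*}^2 > \sigma_i^2 \quad \text{for all } i \ne k^*.$$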
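
The mini-max criterion (3.4) can also be minimized numerically by simple first-order methods. The sketch below uses plain subgradient descent started from the pooled least squares fit; this is a swapped-in technique chosen for brevity, not the genetic algorithm mentioned in the text, and the function names, step size and simulated data are all assumptions of the example.

```python
import numpy as np

def block_mse(beta, X, Y):
    """Per-block mean squared residuals; X has shape (m, n, p), Y has shape (m, n)."""
    resid = Y - X @ beta
    return np.mean(resid**2, axis=1)

def minimax_ls(X, Y, n_iter=2000, step=0.1):
    """Approximate argmin_beta max_i (1/n) sum_j (Y_ij - beta'x_ij)^2.

    Subgradient descent started at the pooled least squares fit. The best
    iterate seen so far is returned, so the result is never worse than
    least squares in terms of the maximum block risk.
    """
    m, n, p = X.shape
    beta_ls, *_ = np.linalg.lstsq(X.reshape(m * n, p), Y.reshape(m * n), rcond=None)
    beta = beta_ls.copy()
    best_beta, best_val = beta.copy(), block_mse(beta, X, Y).max()
    for t in range(1, n_iter + 1):
        i = int(np.argmax(block_mse(beta, X, Y)))         # currently worst block
        grad = -2.0 * X[i].T @ (Y[i] - X[i] @ beta) / n    # a subgradient of the max
        beta = beta - (step / np.sqrt(t)) * grad
        val = block_mse(beta, X, Y).max()
        if val < best_val:
            best_beta, best_val = beta.copy(), val
    return best_beta, beta_ls

# Simulated data with variance-uncertainty across blocks (illustrative only).
rng = np.random.default_rng(0)
m, n, p = 4, 200, 2
X = rng.normal(size=(m, n, p))
sigmas = np.array([0.5, 1.0, 1.5, 2.0])
Y = X @ np.array([1.0, -2.0]) + sigmas[:, None] * rng.normal(size=(m, n))
beta_g, beta_ls = minimax_ls(X, Y)
```

Because the objective is a maximum of convex quadratics, an epigraph reformulation solved with a general convex solver would be another reasonable choice; the subgradient version is shown only because it needs nothing beyond NumPy.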

The mini-max estimator above is asymptotically normally distributed. The 
following theorem gives the details. 

Theorem 3.1 For the mean-certainty model, if condition C1 holds,
$E[xx']$ is a positive definite matrix and $n \to \infty$ as
$N \to \infty$, then

$$\sqrt{n}\,(\hat{\beta}_G - \beta) \xrightarrow{d} N\left(0, \sigma_{k^*}^2 (E[xx'])^{-1}\right) \quad (N \to \infty),$$

where $\xrightarrow{d}$ stands for convergence in distribution and
$N\left(0, \sigma_{k^*}^2 (E[xx'])^{-1}\right)$ is a classical normal
distribution.
This theorem establishes the theoretical foundation for further
statistical inferences such as constructing confidence intervals and
test statistics. From the proof of the theorem given in the Appendix we
can see that condition C1 can be replaced by the following relatively
weak condition:

C1'. $\varepsilon_{k^*1}, \dots, \varepsilon_{k^*n}$ are independent and
have an identical distribution.

This condition only involves the errors with indexes in $I_{k^*}$. Thus
it is relatively common and is implied by (2.3), the second conclusion
of the representation theorem. However, recognizing the fact that the
number $n$ of data in each small time slice $I_i$ should be relatively
large, condition C1 or C1' only applies to the case of high-frequency
data. Moreover, the two conditions implicitly assume that the index
decompositions $I_i$, $i = 1, \dots, m$, or $I_{k^*}$ are known
completely. Under some situations, however, it is difficult or
impossible to get such exact decompositions in advance. Thus,
data-driven decompositions are desired in practice. Now we briefly
discuss this issue. By condition C1', the proof of Theorem 3.1 and
(2.3), the mini-max estimator in (3.4) can be approximately recast as

$$\hat{\beta}_G = \arg\min_{\beta \in \mathcal{B}} \frac{1}{n} \sum_{j=1}^{n} (Y_{k^*j} - \beta' x_{k^*j})^2. \tag{3.5}$$
Thus, a simple approach is to identify $I_{k^*}$ or a subset of it. Let
$I_i^0 = \{(ij) : j = 1, \dots, n^0\}$, $i = 1, \dots, m^0$, be the
initial decomposition according to the observation time order, for
example, where $n^0 > p$. Note that in the case of mean-certainty, the
common LS estimator $\hat{\beta}_{LS}$ of $\beta$ is consistent. We then
arrange $\sum_{j=1}^{n^0} (Y_{ij} - \hat{\beta}_{LS}' x_{ij})^2$,
$i = 1, \dots, m^0$, in descending order as

$$\sum_{j=1}^{n^0} (Y_{i_1 j} - \hat{\beta}_{LS}' x_{i_1 j})^2 \ge \sum_{j=1}^{n^0} (Y_{i_2 j} - \hat{\beta}_{LS}' x_{i_2 j})^2 \ge \dots \ge \sum_{j=1}^{n^0} (Y_{i_{m^0} j} - \hat{\beta}_{LS}' x_{i_{m^0} j})^2.$$

From (2.3) we can see that when $n^0$ is relatively small, the index set
$I_{i_1}^0 = \{(i_1 j) : j = 1, \dots, n^0\}$ can be chosen as an
initial choice of $I_{k^*}$ or a subset of $I_{k^*}$. We then use the
data in $I_{i_1}^0$, together with the approximate formula (3.5), to
build the estimator. Since the data size in $I_{i_1}^0$ may be small, it
is necessary to enlarge the initial choice $I_{i_1}^0$. To this end, we
consider the following hypothesis test:

$$H_0 : \sigma_2^2 = v_1^2 \quad \text{versus} \quad H_1 : \sigma_2^2 < v_1^2,$$

where $\sigma_2^2$ is the supposed variance of $\varepsilon_{i_2 j}$ for
$(i_2 j) \in I_{i_2}^0 = \{(i_2 j) : j = 1, \dots, n^0\}$ and
$v_1^2 = \frac{1}{n^0} \sum_{j=1}^{n^0} (Y_{i_1 j} - \hat{\beta}_{LS}' x_{i_1 j})^2$.
Classical methods can be used to test the hypothesis $H_0$. If $H_0$ is
not rejected, then $I_{i_1}^0 \cup I_{i_2}^0$ could be chosen as an
enlarged choice of $I_{k^*}$. The procedure is repeated until the
remaining variances are significantly smaller than $v_1^2$. We can also
use cluster analysis and/or discriminant analysis to achieve this goal.
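
The ranking-and-enlarging procedure above can be sketched as follows. The ranking step follows the text; the enlargement step replaces the formal variance test by a simple ratio threshold, which is a hypothetical stand-in chosen only to keep the example short. All function names and the `ratio` parameter are assumptions.

```python
import numpy as np

def rank_blocks_by_residuals(X, Y):
    """Order initial blocks by the residual sum of squares of a pooled LS fit.

    X: (m0, n0, p), Y: (m0, n0). Returns (order, rss), where order[0] is the
    block with the largest residual sum of squares.
    """
    m0, n0, p = X.shape
    beta_ls, *_ = np.linalg.lstsq(X.reshape(m0 * n0, p), Y.reshape(m0 * n0), rcond=None)
    rss = np.sum((Y - X @ beta_ls) ** 2, axis=1)
    order = np.argsort(-rss)
    return order, rss

def enlarge_worst_set(order, rss, n0, ratio=0.8):
    """Greedy enlargement of the worst block set.

    The 'ratio' rule is a hypothetical stand-in for the variance test
    H0: sigma_2^2 = v_1^2 versus H1: sigma_2^2 < v_1^2. A block is merged
    while its mean squared residual is at least ratio * v_1^2.
    """
    v1_sq = rss[order[0]] / n0
    selected = [int(order[0])]
    for i in order[1:]:
        if rss[i] / n0 >= ratio * v1_sq:
            selected.append(int(i))
        else:
            break
    return selected

# Made-up block residual sums of squares, used only to show the mechanics.
rss_demo = np.array([10.0, 100.0, 95.0, 20.0])
selected = enlarge_worst_set(np.argsort(-rss_demo), rss_demo, n0=10)
```

In this made-up example blocks 1 and 2 are merged and the procedure stops at block 3. In practice the threshold rule would be replaced by a proper one-sided variance test at a chosen significance level.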

After the estimator $\hat{\beta}_G$ is obtained, a natural prediction of
$Y$ is

$$\hat{Y}_G = \hat{\beta}_G' x. \tag{3.6}$$

If model-uncertainty is ignored and the common least squares (LS) method
is used to construct the estimator $\hat{\beta}_{LS}$ of $\beta$, then
the LS-based prediction is

$$\hat{Y}_{LS} = \hat{\beta}_{LS}' x. \tag{3.7}$$

Comparing the two predictions by maximum prediction risk and average
prediction risk, we have the following conclusion.

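Theorem 3.2 Under the condition of mean-certainty, whether the
variance-uncertainty exists or not, the following relationships always
hold:

$$\max_{1 \le i \le m} \frac{1}{n} \sum_{j=1}^{n} (Y_{ij} - \hat{\beta}_G' x_{ij})^2 \le \max_{1 \le i \le m} \frac{1}{n} \sum_{j=1}^{n} (Y_{ij} - \hat{\beta}_{LS}' x_{ij})^2,$$

$$\frac{1}{m} \sum_{i=1}^{m} \frac{1}{n} \sum_{j=1}^{n} (Y_{ij} - \hat{\beta}_G' x_{ij})^2 \ge \frac{1}{m} \sum_{i=1}^{m} \frac{1}{n} \sum_{j=1}^{n} (Y_{ij} - \hat{\beta}_{LS}' x_{ij})^2.$$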

From the theorem, we have the following finding. 
Remark 3.1 
The theorem indicates that sublinear expectation regression is a robust
strategy that can reduce maximum prediction risk. Thus, it can be
expected that such a regression could be useful for measuring and
controlling financial risks.
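
A brief sketch of why a comparison of maximum and average risks of this kind is natural, assuming the risks are the empirical block averages and that the LS estimator lies in the parameter set $\mathcal{B}$. The symbol $R_i$ is introduced only for this sketch.

```latex
% R_i(\beta): empirical risk of block i (notation introduced for this sketch)
\[
R_i(\beta) = \frac{1}{n}\sum_{j=1}^{n} (Y_{ij} - \beta' x_{ij})^2,
\qquad i = 1, \dots, m.
\]
% The mini-max estimator minimizes the worst block risk by definition:
\[
\max_{1 \le i \le m} R_i(\hat\beta_G)
  = \min_{\beta \in \mathcal{B}} \max_{1 \le i \le m} R_i(\beta)
  \le \max_{1 \le i \le m} R_i(\hat\beta_{LS}).
\]
% The LS estimator minimizes the pooled (average) risk by definition:
\[
\frac{1}{m}\sum_{i=1}^{m} R_i(\hat\beta_{LS})
  = \min_{\beta} \frac{1}{m}\sum_{i=1}^{m} R_i(\beta)
  \le \frac{1}{m}\sum_{i=1}^{m} R_i(\hat\beta_G).
\]
```

In words, each estimator is optimal for its own criterion, so the mini-max estimator trades a possibly larger average risk for a smaller worst-case risk.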

Case 2. We now consider the case of $\varepsilon$ having both
mean-uncertainty and variance-uncertainty. In this case $Y$ has the
sublinear expectation $\beta' x + \overline{\mu}$ given $x$.
Theoretically, we should choose $\beta$ so that it minimizes the
sublinear expectation loss:

$$\mathbb{E}[(Y - \beta' x - \overline{\mu})^2]. \tag{3.8}$$

However, we cannot directly implement the estimation procedure, as
$\overline{\mu}$ is usually unknown. We thus design a profile estimation
procedure as follows. Let $\tilde{\beta}$ be an initial estimator of
$\beta$, which may be the estimator obtained in Case 1 or by common
least squares. When $E[x] = 0$, Proposition 2.1 and Proposition 2.2 show
that the regression coefficient vectors in Case 1 and Case 2 are equal
to each other, and thus such an initial estimator is also consistent for
Case 2. We then estimate $\overline{\mu}$ by

$$\hat{\overline{\mu}} = \max_{1 \le i \le m} \frac{1}{n} \sum_{j=1}^{n} (Y_{ij} - \tilde{\beta}' x_{ij}),$$

and finally estimate $\beta$ by

$$\hat{\beta}_G = \arg\min_{\beta \in \mathcal{B}} \max_{1 \le i \le m} \frac{1}{n} \sum_{j=1}^{n} (Y_{ij} - \beta' x_{ij} - \hat{\overline{\mu}})^2. \tag{3.9}$$

Denote $\mu_i = E[\varepsilon_{ij}]$,
$\sigma_i^2 = E(\varepsilon_{ij} - \mu_i)^2$,
$v_i^2 = \sigma_i^2 + (\overline{\mu} - \mu_i)^2$ and
$v_{k^*}^2 = \max\{v_i^2 : i = 1, \dots, m\}$, and for simplicity,
assume $v_{k^*}^2 > v_i^2$ for all $i \ne k^*$. By the same argument as
that in Theorem 3.1, we can prove that the estimator $\hat{\beta}_G$ is
asymptotically normally distributed. The following theorem presents the
details.
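
The first step of the profile procedure, estimating the upper mean from the block averages of the initial residuals, can be sketched as follows. The function name and the simulated data (three blocks with error means -0.5, 0 and 0.5, covariates with zero mean) are assumptions made for illustration.

```python
import numpy as np

def estimate_upper_mean(X, Y, beta_init):
    """mu_hat = max_i (1/n) sum_j (Y_ij - beta_init' x_ij).

    X: (m, n, p), Y: (m, n); beta_init is an initial estimator of beta
    (for example pooled least squares when E[x] = 0).
    """
    block_means = np.mean(Y - X @ beta_init, axis=1)
    return float(block_means.max()), block_means

# Simulated example (all numbers are illustrative assumptions).
rng = np.random.default_rng(0)
m, n, p = 3, 2000, 2
X = rng.normal(size=(m, n, p))                   # covariates with E[x] = 0
block_mu = np.array([-0.5, 0.0, 0.5])            # mean-uncertainty across blocks
Y = X @ np.array([1.0, -2.0]) + block_mu[:, None] + rng.normal(size=(m, n))
beta_init, *_ = np.linalg.lstsq(X.reshape(m * n, p), Y.reshape(m * n), rcond=None)
mu_hat, block_means = estimate_upper_mean(X, Y, beta_init)
```

Once `mu_hat` is available, the second step amounts to running the Case 1 mini-max fit on the shifted responses `Y - mu_hat`.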
Theorem 3.3 For mean-variance-uncertainty, if condition C1 holds,
$E[x] = 0$, $E[xx']$ is a positive definite matrix and $n \to \infty$ as
$N \to \infty$, then

$$\sqrt{n}\,(\hat{\beta}_G - \beta) \xrightarrow{d} N\left(0, v_{k^*}^2 (E[xx'])^{-1}\right) \quad (N \to \infty).$$

For the proof of the theorem see the Appendix. This theorem establishes
a foundation for further statistical inferences and data analyses. Here
we also need to check condition C1. From the estimation procedure given
above, we see that it is asymptotically equivalent to determining two
index sets, in which the mean of the error and
$\frac{1}{n} \sum_{j=1}^{n} (Y_{kj} - \beta' x_{kj} - \overline{\mu})^2$
achieve the maximum values $\overline{\mu}$ and $v_{k^*}^2$,
respectively. The approaches are similar to those used in Case 1 and
thus details are omitted here. On the other hand, it is worth pointing
out that under mean-uncertainty, the condition $E[x] = 0$ is vital for
estimation consistency. The following remark explains its importance.

Remark 3.2 

For a model that has mean-variance-uncertainty, if $E[x] \ne 0$, then,
by the relationship between (2.8) and (2.9), we can prove that the
estimator $\hat{\overline{\mu}}$ of $\overline{\mu}$ has an asymptotic
bias $-\overline{\mu}\, E[x'] (E[xx'])^{-1} E[x]$. As a result, if
$E[x] \ne 0$, by the same argument as that used in the proof of Theorem
3.3, it can be verified that the estimator $\hat{\beta}_G$ has an
asymptotic bias

$$\mathrm{bias}(\hat{\beta}_G) = (c\, \overline{\mu} - \mu_{k^*}) (E[xx'])^{-1} E[x],$$

where $c = 1 - E[x'] (E[xx'])^{-1} E[x]$. Furthermore, without
$E[x] = 0$, the bias-correction is essentially difficult because, under
the model-uncertainty framework, the law of large numbers cannot
strictly determine the consistency of the sample mean; see Peng (2007
and 2008). On the other hand, the condition $E[x] = 0$ requires that the
intercept term in model (2.1) be zero, which implies that if the
intercept is nonzero, the estimation bias cannot be completely
eliminated and thus the estimator is inconsistent.

With the estimator, a natural prediction of $Y$ is

$$\hat{Y}_G = \hat{\beta}_G' x + \hat{\overline{\mu}}. \tag{3.10}$$

Similar to the properties in Theorem 3.2, the prediction $\hat{Y}_G$
attains the mini-max prediction risk.

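Theorem 3.4 Whether or not the mean-uncertainty and the
variance-uncertainty exist, the following relationship always holds:

$$\max_{1 \le i \le m} \frac{1}{n} \sum_{j=1}^{n} (Y_{ij} - \hat{\beta}_G' x_{ij} - \hat{\overline{\mu}})^2 \le \max_{1 \le i \le m} \frac{1}{n} \sum_{j=1}^{n} (Y_{ij} - \hat{\beta}_{LS}' x_{ij})^2.$$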

It shows that our proposal is a robust strategy and is therefore useful for
measuring and controlling financial risk. Meanwhile, the simulation study
given in Section 5 will verify that when the model has mean-variance-uncertainty,
the average prediction error of the new method is usually smaller than that of the
LS method, namely,

$$\frac{1}{m}\sum_{i=1}^{m}\frac{1}{n}\sum_{j=1}^{n}\big[Y_{ij} - \hat{\beta}_G' x_{ij} - \hat{\overline{\mu}}\big]^2 < \frac{1}{m}\sum_{i=1}^{m}\frac{1}{n}\sum_{j=1}^{n}\big[Y_{ij} - \hat{\beta}_{LS}' x_{ij}\big]^2.$$



This is because the prediction bias of $\hat{\beta}_{LS}' x$ lies between $\underline{\mu}$ and $\overline{\mu}$, which is not
negligible, especially in the case of $\underline{\mu}\,\overline{\mu} > 0$.

4 Variable selection 

In this section we focus on the case when the dimension $p = p_N$ tends to
infinity as the sample size $N$ increases. In this situation, model (2.1) is
further supposed to be sparse in the sense that only $d$ components $\beta_{i_k}$, $k =
1, \dots, d$, are nonzero with $d \ll N$. Without loss of generality, it is assumed
that the first $d$ coefficients $\beta_1, \dots, \beta_d$ are nonzero.

Note that under the sublinear expectation framework, the identifiability the-
ory about $\beta$ and $\mathbb{E}[Y|x]$ given in Proposition 2.1 and Proposition 2.2 is free
of the dimension $p$. Thus, for high-dimensional models, the conclusions in
Proposition 2.1 and Proposition 2.2 still hold. With identifiability, we
can investigate variable selection, parameter estimation and model predic-
tion under the sublinear expectation framework. For simplicity, we only use the
LASSO (Tibshirani (1996) and Zou (2006)) to achieve our goals. The method
developed below can be extended to other penalty methods such as SCAD
(Fan and Li (2001), Fan and Peng (2004)) and the Dantzig selector (Candes and
Tao (2007)).

We first consider the case of $\varepsilon$ having mean-certainty. The theoretical
objective function is defined as

$$\mathbb{E}\big[(Y - \beta' x)^2\big] + \lambda\sum_{k=1}^{p}|\beta_k|, \qquad (4.1)$$

where $\lambda > 0$ is a tuning parameter, which controls the amount of regulariza-
tion applied to the estimate. Under condition C1, the empirical version of
the above objective function is

$$\max_{1\le i\le m}\frac{1}{n}\sum_{j=1}^{n}\big[Y_{ij} - \beta' x_{ij}\big]^2 + \lambda\sum_{k=1}^{p}|\beta_k|. \qquad (4.2)$$

By minimizing (4.2), we can achieve the goals of variable selection and pa-
rameter estimation simultaneously. It can be verified easily that the objective
function above is a convex function of $\beta$. Hence a global minimizer
exists, and it is unique whenever the objective is strictly convex. Furthermore, such an optimization procedure can be easily
implemented via, for example, a genetic algorithm. Denote by $\hat{\beta}_G$ the solu-
tion of the optimization problem (4.2). Note that most components of $\hat{\beta}_G$
are shrunk to zero by choosing a suitable tuning parameter $\lambda$. Then, the goal
of variable selection can be realized. After variable selection and parameter
estimation are completed, a natural prediction of $Y$ can be chosen as

$$\hat{Y} = \hat{\beta}_G' x. \qquad (4.3)$$
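
The empirical criterion (4.2) can be illustrated with a short Python sketch. This is not the authors' implementation: the function names `group_mse` and `g_lasso_objective` and the data layout (one list of `(x, y)` pairs per time slice) are assumptions made here for illustration only.

```python
from typing import List, Sequence, Tuple

# Hypothetical data layout: one list of (x, y) observations per time slice.
Observation = Tuple[Sequence[float], float]


def group_mse(beta: Sequence[float], group: List[Observation]) -> float:
    """Mean squared residual (1/n) * sum_j (Y_ij - beta'x_ij)^2 in one slice."""
    total = 0.0
    for x, y in group:
        fitted = sum(b * xk for b, xk in zip(beta, x))
        total += (y - fitted) ** 2
    return total / len(group)


def g_lasso_objective(beta: Sequence[float],
                      groups: List[List[Observation]],
                      lam: float) -> float:
    """Worst-slice mean squared residual plus the L1 penalty, as in (4.2)."""
    worst = max(group_mse(beta, g) for g in groups)
    return worst + lam * sum(abs(b) for b in beta)


if __name__ == "__main__":
    toy_groups = [
        [((1.0,), 1.0), ((2.0,), 2.0)],  # slice 1: residuals 0, 0 at beta = 1
        [((1.0,), 2.0), ((2.0,), 3.0)],  # slice 2: residuals 1, 1 at beta = 1
    ]
    print(g_lasso_objective((1.0,), toy_groups, lam=0.5))  # 1.0 + 0.5 = 1.5
```

Any general-purpose optimizer could then be applied to `g_lasso_objective`; the text mentions a genetic algorithm as one option.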

Similar to the arguments in Theorem 3.2, our method is a robust strategy 
because the selected model can reduce the maximum prediction risk. Thus, 
the selected model by sublinear expectation can be employed to measure and 
control financial risks. 

From the proof of Theorem 3.1, we see that when $n$ is large enough, the
term of order $O_p(1/n)$ can be ignored and the objective function above is
approximately equal to

$$\frac{1}{n}\sum_{j=1}^{n}\big[Y_{i_*j} - \beta' x_{i_*j}\big]^2 + \lambda\sum_{k=1}^{p}|\beta_k|, \qquad (4.4)$$

where $i_*$ is the index of the interval $I_{i_*}$ in which the variance of $\varepsilon$ achieves the
maximum value. This representation implies that the properties of variable
selection and parameter estimation, such as the selection consistency and
the oracle property of the estimator, are the same as those of the standard
LASSO. So it is unnecessary to restudy these theoretical properties under
the sublinear expectation framework. However, this representation shows
that the number of data in each small time slice $I_i$ should be relatively large.
Therefore our method only applies to high-frequency data.

If $\varepsilon$ possesses both mean-uncertainty and variance-uncertainty,
as was shown in the previous section, we need the condition $E[x] = 0$ to
guarantee the consistency of estimation. Variable selection and parameter
estimation can be obtained by minimizing the following objective function

$$\max_{1\le i\le m}\frac{1}{n}\sum_{j=1}^{n}\big[Y_{ij} - \beta' x_{ij} - \hat{\overline{\mu}}\big]^2 + \lambda\sum_{k=1}^{p}|\beta_k|. \qquad (4.5)$$
Here $\hat{\overline{\mu}}$ is an initial estimator of $\overline{\mu}$ defined by

$$\hat{\overline{\mu}} = \max_{1\le i\le m}\frac{1}{n}\sum_{j=1}^{n}\big[Y_{ij} - \hat{\beta}_G' x_{ij}\big],$$

where the $\hat{\beta}_G$ appearing here may be taken as the minimizer of (4.2). The minimizer of (4.5)
is again denoted by $\hat{\beta}_G$. Then a prediction of $Y$ is chosen as

$$\hat{Y} = \hat{\beta}_G' x + \hat{\overline{\mu}}. \qquad (4.6)$$

This prediction also achieves the mini-max prediction risk, and the predicted
value tends to be larger.
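
The initial estimator of the upper mean and the prediction (4.6) can be sketched in a few lines of Python. The names `upper_mean_estimate` and `predict`, and the layout of `groups` (one list of `(x, y)` pairs per time slice), are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical data layout: one list of (x, y) observations per time slice.
Observation = Tuple[Sequence[float], float]


def upper_mean_estimate(beta: Sequence[float],
                        groups: List[List[Observation]]) -> float:
    """max_i (1/n) sum_j (Y_ij - beta'x_ij): the largest slice-wise mean residual."""
    slice_means = []
    for group in groups:
        residuals = [y - sum(b * xk for b, xk in zip(beta, x)) for x, y in group]
        slice_means.append(sum(residuals) / len(residuals))
    return max(slice_means)


def predict(beta: Sequence[float], mu_hat: float, x: Sequence[float]) -> float:
    """Prediction beta'x + mu_hat, mirroring the form of (4.6)."""
    return sum(b * xk for b, xk in zip(beta, x)) + mu_hat


if __name__ == "__main__":
    toy_groups = [
        [((1.0,), 1.0), ((2.0,), 2.0)],  # mean residual 0 at beta = 1
        [((1.0,), 2.0), ((2.0,), 3.0)],  # mean residual 1 at beta = 1
    ]
    mu_hat = upper_mean_estimate((1.0,), toy_groups)
    print(mu_hat, predict((1.0,), mu_hat, (3.0,)))
```

Because the largest slice mean is added to every prediction, the predicted value is shifted upward relative to the fit without the mean term, which matches the remark that the prediction tends to be larger.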

5 Simulation study and real data analysis 

5.1 Simulation study 

In this section we present several simulation examples to compare the fi- 
nite sample performances of the sublinear expectation regression proposed 
in this paper with the existing competitors, such as the classical LS regres- 
sion and the LASSO regression. To get comprehensive comparisons, we use 
the mean square error (MSE), maximum prediction error (MPE) and av- 
erage prediction error (APE), together with scatter plots of the estimation 
and prediction, to assess the different methods. The simulations given
below lead to the following findings: (1) The new methods can signif-
icantly reduce the MPE in all the situations; (2) When the model has
mean-certainty, the advantages of the new methods over the classical LS
methods are not very obvious; (3) In the case of mean-uncertainty, the
predictions of the classical LS methods do not work and nearly collapse,
but the new methods still give a valid prediction because the impact of the
mean-uncertainty on the new methods is successfully eliminated by the
use of the sublinear expectation of the error. Thus, our proposals are robust
to the uncertainties of mean and variance and, particularly in the case of
mean-uncertainty, their advantages are rather obvious.
Experiment 1. We first consider the following simple linear model

$$Y = \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon.$$

In the simulation procedure, the regression coefficients are chosen as $\beta_k =
1$, $k = 1,2,3$, and the observation values of $X_k$ are independent and identically
distributed from $N(10,2)$, $k = 1,2,3$. We choose $\varepsilon \sim N(\{0\} \times [0,3])$, a
G-normal distribution with certain zero mean. In this case, the model has
mean-certainty. The following procedure is used to generate data from the G-
normal distribution approximately. First, generate variance values $\sigma_i^2$, $i =
1, \dots, m$, from the uniform distribution $U[0,3]$, and then generate the values
$\varepsilon_{ij}$, $j = 1, \dots, n$, of $\varepsilon$ from the common normal distribution $N(0,\sigma_i^2)$. For
$m = 10$ and $n = 10$, the simulation results are reported in Table 1, in which
MSE, MPE and APE denote the mean squared error, maximum prediction
error and average prediction error, respectively; for the definitions of MPE and
APE see Proposition 2.2. The simulation results show that the MSE
and APE of the common LS estimation $\hat{\beta}_{LS}$ are significantly smaller than those
of the G-normal estimation $\hat{\beta}_G$. Such a result is not surprising because, under
the mean-certainty model, the common LS estimation $\hat{\beta}_{LS}$ is consistent, whereas
the construction of the new estimation $\hat{\beta}_G$ only uses the data in a small time
interval (essentially, the number of data used to construct the estimator
$\hat{\beta}_G$ is only 10). On the other hand, the MPE of the new estimator $\hat{\beta}_G$ is significantly
smaller than that of the LS estimator $\hat{\beta}_{LS}$, which implies that the new method
can reduce the maximum prediction risk and is therefore a robust strategy.
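
The approximate generation scheme just described can be sketched in Python as follows. The function name `simulate_errors` and its parameters are assumptions introduced here; the optional `mean_range` argument goes slightly beyond Experiment 1 and lets the slice means vary as well, which is how errors with mean-uncertainty would be drawn under the same scheme.

```python
import random
from typing import List, Optional, Tuple


def simulate_errors(m: int,
                    n: int,
                    var_range: Tuple[float, float],
                    mean_range: Tuple[float, float] = (0.0, 0.0),
                    seed: Optional[int] = None) -> List[List[float]]:
    """Draw m slices of n errors each.

    For each slice i, a mean mu_i and a variance sigma_i^2 are drawn uniformly
    from mean_range and var_range; the n errors of that slice are then drawn
    from the classical normal distribution N(mu_i, sigma_i^2).
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(m):
        mu_i = rng.uniform(*mean_range)
        var_i = rng.uniform(*var_range)
        sd_i = var_i ** 0.5  # random.gauss expects a standard deviation
        errors.append([rng.gauss(mu_i, sd_i) for _ in range(n)])
    return errors


if __name__ == "__main__":
    # Setting of Experiment 1: zero mean, variance in [0, 3], m = n = 10.
    eps = simulate_errors(m=10, n=10, var_range=(0.0, 3.0), seed=2024)
    print(len(eps), len(eps[0]))
```

Note that the second argument of `N(., .)` is treated as a variance here, which is how the text describes it; the square root is taken before sampling.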
The simulation results above indicate that when the model has mean-
certainty, the advantages of the new methods over the common LS are not
very obvious. Moreover, the new methods even have the disadvantage
of instability. In the following, we will see that when the model has mean-
uncertainty, our new methods have rather clear advantages over the LS-based
methods.

Table 1: Simulation results of estimation and prediction for Experiment 1 with m = 10
and n = 10

Estimator          MSE($\beta_1$)   MSE($\beta_2$)   MSE($\beta_3$)   MPE      APE
$\hat{\beta}_G$        0.0080       0.0301       0.0315       6.0259   3.5388
$\hat{\beta}_{LS}$     0.0026       0.0045       0.0037       6.6122   2.8584

Experiment 2. We reconsider the linear model

$$Y = \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon,$$

which has the same form as in Experiment 1. However, here the model has
mean-variance-uncertainty as $\varepsilon \sim N([3,5] \times [0,4])$. The other experiment
conditions are designed as $X_k \sim N(0,1)$, $k = 1,2,3$, $m = 10$ and $n = 20$.
The values of $\varepsilon$ are generated in the following way. First, the values $\mu_i$ of
the mean and the values $\sigma_i^2$ of the variance are generated from the uniform
distributions $U[3,5]$ and $U[0,4]$ respectively, and then the values $\varepsilon_{ij}$, $j =
1, \dots, n$, of $\varepsilon$ are generated from the common normal distribution $N(\mu_i, \sigma_i^2)$
for $i = 1, \dots, m$. The simulation results are listed in Table 2. For the MSE
of the parameter estimation, the results are similar to those in Experiment 1,
i.e., the MSE of the LS estimation is smaller than that of the new estimation
because the new method in principle only uses the data in a small subinterval.
However, when mean-uncertainty and variance-uncertainty appear in the
model, both the MPE and the APE of the new one are significantly smaller
than those of the LS estimator. In particular, the prediction by the LS seems
to be totally invalid. This indicates that ignoring the model-uncertainty will
lead to a serious prediction risk.
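
For readers who want to reproduce comparisons of this kind, the two prediction criteria can be computed as in the sketch below. The formal definitions are those of Proposition 2.2; the slice-wise forms used here (MPE as the largest slice mean squared prediction error, APE as the average over slices) are an assumption inferred from the surrounding formulas, and the function name `prediction_errors` is hypothetical.

```python
from typing import List, Sequence, Tuple


def prediction_errors(y_groups: List[Sequence[float]],
                      yhat_groups: List[Sequence[float]]) -> Tuple[float, float]:
    """Return (MPE, APE) from observed and predicted values grouped by slice.

    Assumed definitions:
      MPE = max_i  (1/n) sum_j (Y_ij - Yhat_ij)^2
      APE = (1/m) sum_i (1/n) sum_j (Y_ij - Yhat_ij)^2
    """
    slice_mse = [
        sum((y - yh) ** 2 for y, yh in zip(ys, yhs)) / len(ys)
        for ys, yhs in zip(y_groups, yhat_groups)
    ]
    mpe = max(slice_mse)
    ape = sum(slice_mse) / len(slice_mse)
    return mpe, ape


if __name__ == "__main__":
    y = [[1.0, 2.0], [3.0, 5.0]]
    y_hat = [[1.0, 2.0], [2.0, 3.0]]
    print(prediction_errors(y, y_hat))  # slice errors are 0.0 and 2.5
```

By construction the MPE is never smaller than the APE, so a method can lower the MPE while its APE is higher or lower than that of a competitor.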

Table 2: Simulation results of estimation and prediction for Experiment 2 with m = 10
and n = 20

Estimator          MSE($\beta_1$)   MSE($\beta_2$)   MSE($\beta_3$)   MPE       APE
$\hat{\beta}_G$        0.1258       0.2769       0.2398       14.4254   6.8787
$\hat{\beta}_{LS}$     0.1141       0.1891       0.1879       36.0253   21.2932

Experiment 3. In this experiment, we consider the following high-dimensional
linear model

$$Y = \sum_{j=1}^{p}\beta_j X_j + \varepsilon.$$
In the simulation procedure we choose $p = 40$, $\beta_j = 1$ for $1 \le j \le 5$ and
$\beta_j = 0$ for all $j \ge 6$, $X \sim N_{40}(0, I_{40})$, $\varepsilon \sim N(\{0\} \times [1,4])$, and the sample
size satisfies $m = 10$ and $n = 200$. As in Experiment 1, this
model has mean-certainty. We consider the common LS and G-normal
estimation, and also use the common LASSO and the G-normal LASSO
(G-LASSO) defined in Section 4 to select variables and estimate parameters
simultaneously. The tuning parameter $\lambda$ is determined by cross-validation (CV). Under the
above experiment conditions, for the common LASSO estimation, the value
of $\lambda$ is chosen as $\lambda_{LS} = 0.0604$; for the G-LASSO, the value of $\lambda$ is chosen
as $\lambda_G = 0.3377$. The simulation results are reported in Table 3 and Figure
1. In Table 3, GNR, LSR, Lasso-GNR and Lasso-LSR stand for the G-
normal regression, LS regression, LASSO-G-normal regression and LASSO-
LS regression, respectively. The simulation results in Table 3 verify that
the new methods can efficiently reduce the maximum prediction error. From
Figure 1 we have the following findings: (1) The LS methods are more
stable than the new methods; (2) Like the common LASSO, the G-normal
LASSO can efficiently select the active variables.
Figure 1: The figures of estimation for Experiment 3 with independent covariates.



Table 3: Simulation results of prediction for Experiment 3 with independent covariates

Models   GNR      LSR      Lasso-GNR   Lasso-LSR
MPE      4.0365   4.5443   4.0221      4.1206
APE      2.5334   2.0801   2.2363      2.0573



To further examine the behaviors, here we consider correlated co-
variates: $X \sim N_{40}(0, \Sigma)$, where $\Sigma$ is a $40 \times 40$ matrix with the $(i,j)$-element

$$\Sigma_{ij} = \begin{cases} 1, & \text{for } i = j, \\ 0.5, & \text{for } i \neq j. \end{cases}$$

The other experiment conditions are the same as above. The
simulation results are presented in Figure 2. The results
in Figure 2 are similar to those in Figure 1, but they are not as stable as
before because of the correlation among the covariates.
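
One simple way to draw covariates with this equicorrelated structure is sketched below in Python. It relies on the standard one-factor construction $X_k = \sqrt{\rho}\,Z_0 + \sqrt{1-\rho}\,Z_k$ with independent standard normal $Z_0, Z_1, \dots, Z_p$, which gives unit variances and pairwise covariances $\rho$. This is an illustrative choice, not necessarily how the authors generated their data, and the function names are hypothetical.

```python
import math
import random
from typing import List


def equicorrelated_cov(p: int, rho: float) -> List[List[float]]:
    """p x p matrix with 1 on the diagonal and rho elsewhere."""
    return [[1.0 if i == j else rho for j in range(p)] for i in range(p)]


def sample_equicorrelated(p: int, rho: float, rng: random.Random) -> List[float]:
    """One draw from N_p(0, Sigma) via a shared factor; requires 0 <= rho <= 1."""
    shared = rng.gauss(0.0, 1.0)  # common factor Z_0
    a = math.sqrt(rho)
    b = math.sqrt(1.0 - rho)
    return [a * shared + b * rng.gauss(0.0, 1.0) for _ in range(p)]


if __name__ == "__main__":
    rng = random.Random(0)
    sigma = equicorrelated_cov(40, 0.5)
    x = sample_equicorrelated(40, 0.5, rng)
    print(len(sigma), len(sigma[0]), len(x))
```

For a general covariance matrix one would instead multiply a vector of independent standard normals by a Cholesky factor; the one-factor shortcut only covers the constant-correlation case used in this experiment.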
Figure 2: The figures of estimation for Experiment 3 with correlated covariates.



Experiment 4. In this experiment, the model has the same
form as that in Experiment 3, but it has both mean-uncertainty
and variance-uncertainty. Formally, the error is distributed as $\varepsilon \sim
N([5,10] \times [1,4])$, which has mean-variance-uncertainty. We first con-
sider the simulations for the GNR and the LSR without use of the LASSO,
the results being reported in Figure 3. Figure 3(i) verifies again that the
parameter estimation of the LSR is more stable than that of the GNR. On
the other hand, Figure 3(ii) provides clear evidence that with mean-
variance-uncertainty, the LSR has rather large values of the MPE and the
APE and therefore the LSR prediction is completely invalid, whereas the GNR
can significantly reduce both the MPE and the APE. These results imply
that under the mean-variance-uncertainty framework, ignoring the mean-
uncertainty will result in a serious prediction risk, but the new method can
efficiently reduce the prediction risk by using the information on the mean-
uncertainty of the error $\varepsilon$.

Figure 3: The figures of estimation and prediction for Experiment 4.

Now we consider variable selection and parameter estimation by the
LASSO. Under the experiment conditions above, we get $\lambda_G = 0.6494$ and
$\lambda_{LS} = 0.4670$ via the CV method. For the Lasso-GNR, the simulation re-
sults are given in Figure 4. They show that the new method can efficiently
select active variables and that, at the same time, the prediction risks are rather
small. For the Lasso-LSR, the simulation results are presented in Figure
5. Comparing Figure 4 and Figure 5 gives clear evidence
that the new method markedly reduces the prediction risk, whereas the LS
prediction nearly collapses.

Figure 4: The figures of GNR and Lasso-GNR estimation and prediction for
Experiment 4 (panels (i) and (ii)).
Figure 5: The figures of LSR and Lasso-LSR estimation and prediction for
Experiment 4.



In short, our methods are robust to the uncertainties of mean and vari-
ance. In particular, in the case of serious mean-uncertainty, the classical
methods may collapse, but our new methods can successfully eliminate the
impact of mean-uncertainty and construct efficient predictions. The main dis-
advantage of the new methods is instability: the resulting
estimation has relatively large variance, since the mini-max estimation essentially only
uses the data in a subinterval.

5.2 Real data analysis 

The non-performing loan (NPL) is always an important object to be monitored
in the financial market. To investigate the relationship between the NPL ratio
and a set of economic indicators, we use our models, together with the new
estimation methods, to fit the real data published in Vendors Database of
China (2000-2010). We also compare our fittings with the LS fittings that
ignore the distribution-uncertainty. According to the indicator system in
Vendors Database, after the indicators for which the data are incomplete
are deleted, we choose the following indicators as initial choices: loan-deposit
ratio ($X_1$), capital adequacy ratio ($X_2$), core capital adequacy ratio ($X_3$),
liquidity ratio of short-term assets of RMB ($X_4$), liquidity ratio of short-term
assets of foreign currencies ($X_5$), proportion of loans from other banks ($X_6$),
proportion of loans to other banks ($X_7$), ten largest customers loan ratio ($X_8$),
single biggest customer loan ratio ($X_9$) and NPL provision coverage ($X_{10}$).
Because the indicators $X_j$ are percentages, they are transformed logarithmically,
with some constants $a_j > 0$ and $b_j > 1$, and then the $X_j$ are centralized
so that the centralized versions of $X_j$ have zero mean. In the following, we still
use $X_j$ to denote the transformed and centralized indicators for simplicity.
According to the observation time order, the data are decomposed into five sets,
in which the numbers of valid data are $n_1 = 26$, $n_2 = 25$, $n_3 = 21$, $n_4 = 20$
and $n_5 = 31$ respectively.

The real data analyses given below lead to the following find-
ings: (1) With the model-uncertainty technique, the new methods in most cases
fit the data more efficiently than the LS does; (2) In particular, when the tech-
nique of mean-variance-uncertainty is employed to fit the real data, a more
precise fit can be obtained.

5.2.1 Case 1 (Mean-certainty model) 

We first use a model with mean-certainty to fit the data. 

(1) If variable selection is not taken into account, by our method of
variance-uncertainty, we get the empirical model

$M_G$-1: $Y = -0.2602X_1 + 0.1922X_2 - 0.3953X_3 - 0.2513X_4 + 0.0607X_5
- 0.1808X_6 + 0.0727X_7 + 0.4314X_8 - 0.1503X_9 - 0.5656X_{10}$.

With this model, the maximum prediction error and average prediction error
have the values

MPE($M_G$-1) = 1.6009, APE($M_G$-1) = 0.4632.






For comparison, the LS is used to build a model; the resulting empir-
ical model has the following form:

$M_{LS}$-1: $Y = -0.2590X_1 + 0.1843X_2 - 0.3972X_3 - 0.2268X_4 + 0.0543X_5
- 0.2073X_6 + 0.0914X_7 + 0.2734X_8 - 0.0315X_9 - 0.5884X_{10}$.

The corresponding prediction errors have the following values:

MPE($M_{LS}$-1) = 1.4396, APE($M_{LS}$-1) = 0.4323.

By comparing the prediction errors, we see that in this case our method has
no advantage over the LS fitting. We will analyze the causes in the following
studies.

(2) Since some of the ten economic indicators are clearly
correlated and the number of data is relatively small, the fittings above
are inefficient. It is necessary to select variables so that the final model is
parsimonious and workable. Now we use the Lasso, together with variance-
uncertainty, to build an empirical model, which has the following form:

$M_G$-2: $Y = -0.1770X_1 - 0.0111X_2 - 0.1878X_3
- 0.0549X_4 + 0.1397X_8 - 0.5956X_{10}$.

With this treatment, the prediction errors have the following values:

MPE($M_G$-2) = 0.8443, APE($M_G$-2) = 0.3500.

By use of the Lasso, the inactive predictors are removed from the model, the
model size is significantly reduced and the prediction effectiveness is clearly
improved.

If variance-uncertainty is ignored, the Lasso-LS empirical model has the
following form:

$M_{LS}$-2: $Y = -0.0387X_1 - 0.0269X_2 - 0.0542X_3
+ 0.0722X_8 + 0.0352X_9 - 0.5381X_{10}$,

and the corresponding prediction errors have the following values:

MPE($M_{LS}$-2) = 1.0420, APE($M_{LS}$-2) = 0.4258.

By comparing $M_G$-2 and $M_{LS}$-2, we have clear evidence that our method
can reduce prediction errors.

5.2.2 Case 2 (Mean-variance-uncertainty model) 

We can verify that $\underline{\mu} = -0.1833$ and $\overline{\mu} = 0.1747$. Thus, such a mean-
uncertainty is not negligible. To improve data fitting, both mean-uncertainty
and variance-uncertainty are taken into account in the following modeling
procedure.

(1) Without use of variable selection, the model with mean-variance-
uncertainty has the following empirical expression:

$M_G$-1: $Y = -0.2315X_1 + 0.1888X_2 - 0.4765X_3 - 0.2673X_4 + 0.0129X_5 - 0.2590X_6
+ 0.0798X_7 + 0.6331X_8 - 0.3093X_9 - 0.5374X_{10} + 0.1747$.

This model leads to the prediction errors

MPE($M_G$-1) = 0.9182, APE($M_G$-1) = 0.3837.

Compared with both $M_G$-1 and $M_{LS}$-1 of Case 1, this model has the fol-
lowing two distinctive features: it uses a relatively large value to predict the
NPL ratio, and the prediction errors are significantly reduced.

(2) By use of the Lasso, the model with mean-variance-uncertainty has
the following empirical expression:

$M_G$-2: $Y = -0.0389X_1 - 0.0420X_2 - 0.1309X_3 - 0.5108X_{10} + 0.1747$.

With this treatment, the prediction errors become

MPE($M_G$-2) = 0.7305, APE($M_G$-2) = 0.4362.

This model may be the best one among all the models mentioned above
because it has both the smallest model size and the smallest MPE.

In short, a flexible model that has mean-variance-uncertainty can fit the
real data relatively precisely and is parsimonious and workable.

6 Appendix 

6.1 Definition of sublinear expectation 

Let $\Omega$ be a given set and $\mathcal{H}$ be a linear space of real-valued functions defined
on $\Omega$. Suppose that $\mathbb{E}: \mathcal{H} \to \mathbb{R}$ satisfies the following properties: for all
$U, V \in \mathcal{H}$,

(i) Monotonicity: if $U \ge V$ then $\mathbb{E}[U] \ge \mathbb{E}[V]$;

(ii) Constant preservation: $\mathbb{E}[c] = c$ for any constant $c$;

(iii) Sub-additivity: $\mathbb{E}[U + V] \le \mathbb{E}[U] + \mathbb{E}[V]$;

(iv) Positive homogeneity: $\mathbb{E}[\lambda U] = \lambda\mathbb{E}[U]$ for each $\lambda \ge 0$.

Then $(\Omega, \mathcal{H}, \mathbb{E})$ is called a sublinear expectation space.
It can be verified that (iii) and (iv) together imply

(v) Convexity:

$\mathbb{E}[\alpha U + (1 - \alpha)V] \le \alpha\mathbb{E}[U] + (1 - \alpha)\mathbb{E}[V]$ for $\alpha \in [0, 1]$.

Furthermore, (ii) and (iii) together lead to

(vi) Cash translatability:

$\mathbb{E}[U + c] = \mathbb{E}[U] + c$ for any constant $c$.
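
A concrete instance may help fix ideas. A standard example of a sublinear expectation is the upper expectation over a finite family of probability measures on a finite sample space, which is the form suggested by the representation theorem. The Python sketch below is only an illustration under that assumption; the function name `upper_expectation` and the toy numbers are hypothetical.

```python
from typing import List, Sequence


def upper_expectation(values: Sequence[float],
                      measures: List[Sequence[float]]) -> float:
    """E[U] = max over the family of probability vectors P of sum_w P(w) * U(w)."""
    return max(sum(p * v for p, v in zip(probs, values)) for probs in measures)


if __name__ == "__main__":
    # Two candidate probability measures on a two-point sample space.
    family = [(0.5, 0.5), (0.2, 0.8)]
    u = (1.0, 3.0)
    v = (4.0, 0.0)
    u_plus_v = tuple(a + b for a, b in zip(u, v))

    e_u = upper_expectation(u, family)
    e_v = upper_expectation(v, family)
    e_sum = upper_expectation(u_plus_v, family)

    # Sub-additivity (iii): E[U + V] <= E[U] + E[V], strictly in this example.
    print(e_sum, e_u + e_v)
```

In this toy case the maximum for $U$ and for $V$ is attained under different measures, which is why the inequality in (iii) is strict; with a single measure in the family the upper expectation reduces to a classical linear expectation.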
6.2 Proofs

Proof of Proposition 2.1 We only need to prove the second result. It is
clear that (2.5) yields

$$x\,\mathbb{E}[Y|x] = xx'\beta.$$

Note that the distribution of $x$ is certain. Consequently,

$$E\{x\,\mathbb{E}[Y|x]\} = E[xx']\beta.$$

This implies the second result of the proposition. □

Proof of Proposition 2.2 We only need to prove the second result. It 
is obvious that by (12.51) we have 

$$x\,\mathbb{E}[Y|x] = xx'\beta + \overline{\mu}x$$

and consequently

$$E\{x\,\mathbb{E}[Y|x]\} = E(xx')\beta + \overline{\mu}E[x].$$

This implies the second result of the proposition. □
Proof of Theorem 3.1 It follows from C1 that

1 '^ 

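where $\delta_n$ is of order $O_p(1/n)$ and is free of $\beta$. Consequently,

$$\max_{1\le i\le m}\frac{1}{n}\sum_{j=1}^{n}\varepsilon_{ij}^2 = \overline{\sigma}^2 + \delta_n.$$

Denote by $\beta^0$ the true value of $\beta$. Then

$$\max_{1\le i\le m}\frac{1}{n}\sum_{j=1}^{n}\big[Y_{ij} - \beta' x_{ij}\big]^2 = \max_{1\le i\le m}\frac{1}{n}\sum_{j=1}^{n}\big[\varepsilon_{ij}^2 - 2(\beta - \beta^0)'x_{ij}\varepsilon_{ij} + (\beta - \beta^0)'x_{ij}x_{ij}'(\beta - \beta^0)\big].$$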
Note that $X_{ij}$, $i = 1, \dots, m$, $j = 1, \dots, n$, are independent and identically
distributed with zero mean and variance af. By comparing the asymptotic 
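orders of all the terms in the above expression, we have

$$\max_{1\le i\le m}\frac{1}{n}\sum_{j=1}^{n}\big[Y_{ij} - \beta' x_{ij}\big]^2 = \frac{1}{n}\sum_{j=1}^{n}\big[Y_{i_*j} - \beta' x_{i_*j}\big]^2 + O_p(1/n),$$

where $i_*$ is the index of the slice in which the variance of $\varepsilon$ achieves its maximum.
As was shown, $\varepsilon_{ij}$ and $\delta_n$ are free of $\beta$. Thus, to get the estima-
tor of $\beta$, minimizing $\max_{1\le i\le m}\frac{1}{n}\sum_{j=1}^{n}[Y_{ij} - \beta' x_{ij}]^2$ is equivalent to minimizing

$$\sum_{j=1}^{n}\big[Y_{i_*j} - \beta' x_{i_*j}\big]^2.$$

We rewrite the above objective function as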






^n(7) = E 

The function $Z_n(\gamma)$ is obviously convex and is minimized at $\hat{\gamma}_n = \sqrt{n}(\hat{\beta}_G -
\beta^0)$. It follows from the Lindeberg-Feller central limit theorem and C1 that

$$Z_n(\gamma) \xrightarrow{d} Z_0(\gamma) = -2W'\gamma + \gamma'E(xx')\gamma,$$

where $W \sim N(0, \overline{\sigma}^2E(xx'))$. The convexity of the limiting objective func-
tion, $Z_0(\gamma)$, assures the uniqueness of the minimizer and consequently, that

$$\sqrt{n}(\hat{\beta}_G - \beta^0) = \hat{\gamma}_n = \arg\min Z_n(\gamma) \xrightarrow{d} \gamma_0 = \arg\min Z_0(\gamma)$$

(see, e.g., Pollard 1991, Hjort and Pollard 1993, Knight 1998). Finally, we
see $\gamma_0 = (E(xx'))^{-1}W$ and the result follows. □
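
The last step can be spelled out as a short worked derivation. It assumes, as the argument implicitly does, that $E(xx')$ is symmetric and positive definite; the covariance of $\gamma_0$ then follows from the stated distribution of $W$.

```latex
\begin{aligned}
\nabla_\gamma Z_0(\gamma) &= -2W + 2E(xx')\gamma = 0
  \quad\Longrightarrow\quad \gamma_0 = (E(xx'))^{-1}W, \\
\operatorname{Cov}(\gamma_0) &= (E(xx'))^{-1}\,\overline{\sigma}^2E(xx')\,(E(xx'))^{-1}
  = \overline{\sigma}^2\,(E(xx'))^{-1}.
\end{aligned}
```

Since the Hessian $2E(xx')$ is positive definite under this assumption, the stationary point is the unique minimizer of $Z_0$, and $\gamma_0$ is normal with mean zero and the covariance shown.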

Proof of Theorem 3.2 The definitions of the two estimations lead di-
rectly to the conclusions of the theorem. □
Proof of Theorem 3.3 From the proof of Theorem 3.1 we see that $\hat{\beta}_G$
is actually the common LS estimator of $\beta$ obtained from the data $(Y_{i_*j}, x_{i_*j})$, $j =
1, \dots, n$. Thus $\hat{\beta}_G = \beta + O_p(1/n)$, where $\beta$ is the true regression coefficient
given by (2.6) in the mean-certainty model. When $E[x] = 0$, the true re-
gression coefficients in the mean-certainty model and the mean-uncertainty
model are the same, as given in (2.6) and (2.9). Moreover, by the same
argument as used in the proof of Theorem 3.1, we have
1 " 

l<i<m n •^■"^ l<t<m 

i=i 
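The above discussion ensures that

$$\hat{\overline{\mu}} = \max_{1\le i\le m}\frac{1}{n}\sum_{j=1}^{n}\big[Y_{ij} - \beta' x_{ij}\big] + O_p(1/n) = \max_{1\le i\le m}E_{F_i}[\varepsilon] + O_p(1/n) = \overline{\mu} + O_p(1/n),$$

where $F_i$ is the distribution of the data in $I_i$. Consequently,

$$\max_{1\le i\le m}\frac{1}{n}\sum_{j=1}^{n}\big[Y_{ij} - \beta' x_{ij} - \hat{\overline{\mu}}\big]^2 = \max_{1\le i\le m}\frac{1}{n}\sum_{j=1}^{n}\big[Y_{ij} - \beta' x_{ij} - \overline{\mu}\big]^2 + O_p(1/n).$$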

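On the other hand,

$$\frac{1}{n}\sum_{j=1}^{n}\big[Y_{ij} - \beta' x_{ij} - \overline{\mu}\big]^2 = \frac{1}{n}\sum_{j=1}^{n}\big[\varepsilon_{ij} - \overline{\mu}\big]^2 = \sigma_i^2 + (\overline{\mu} - \mu_i)^2 + O_p(1/n).$$

Then,

$$\max_{1\le i\le m}\frac{1}{n}\sum_{j=1}^{n}\big[Y_{ij} - \beta' x_{ij} - \hat{\overline{\mu}}\big]^2 = \frac{1}{n}\sum_{j=1}^{n}\big[Y_{kj} - \beta' x_{kj} - \overline{\mu}\big]^2 + O_p(1/n).$$

By the above result, $E[x] = 0$ and the same argument as used in the proof
of Theorem 3.1, we can prove the theorem. □

Proof of Theorem 3.4 The proof of the theorem follows directly from
the definitions of the two estimators. □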